Enhancer transcription detected in the nascent transcriptomic landscape of bread wheat

Precise spatiotemporal gene expression is orchestrated by enhancers, which lack general sequence features and are therefore difficult to identify computationally. Using nascent RNA sequencing combined with epigenome profiling, we detect active transcription of enhancers in the complex bread wheat genome. We find that genes associated with transcriptional enhancers are expressed at significantly higher levels, and that enhancer RNA predicts enhancer activity more precisely and robustly than chromatin features do. We demonstrate that sub-genome-biased enhancer transcription could drive sub-genome-biased gene expression. This study highlights enhancer transcription as a hallmark of gene expression regulation in wheat.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-022-02675-1.

Detecting wheat transcription by nascent RNA sequencing methods. a, GRO-seq reads aligned to different Arabidopsis genomic regions. TSS1K indicates the 1 kb region upstream of the gene transcription start site. TES1K indicates the 1 kb region downstream of the gene transcription end site/polyadenylation site. b, Genomic coverage of ssRNA-seq, GRO-seq and pNET-seq with different read counts in wheat, human, maize and Arabidopsis. c, GRO-seq and pNET-seq signals along annotated genic regions.

a, Correlation of read density of transcription clusters from genic and intergenic regions detected by GRO-seq and pNET-seq, respectively. b, Venn diagram of intergenic transcription clusters detected by GRO-seq and pNET-seq.

Figure S5. Read densities of GRO-seq, DNase-seq and ChIP-seq of Pol II, H3K9ac, H3K4me3, H3K36me3, H3K27ac, and H3K4me1 around the intergenic and genic transcription clusters (TCs, ± 3 kb). All intergenic and genic TCs were ranked in descending order of GRO-seq signal (± 250 bp around the 5' end), and chromatin features were plotted around the 5' end of each intergenic and genic TC.

Figure S6. Chromatin states around the genic transcription clusters (TCs) by pNET-seq. Genic TCs were divided into ten equal parts in order of decreasing pNET-seq signal (± 250 bp). Read densities of DNase-seq and ChIP-seq of H3K9ac, H3K4me3, H3K36me3, H3K27ac, and H3K4me1 are shown around each of the ten parts of genic TCs.

Figure S14. Reporter constructs for enhancer activity measurement. a, Expression cassettes using gfp (green fluorescent protein) as reporter. b, Expression cassettes using Rluc (Renilla luciferase) and Fluc (firefly luciferase) as dual reporters. c, A formula for calculating enhancer activity based on luciferase activity. Blank control: mini 35S pro, a minimal cauliflower mosaic virus 35S promoter. Negative control: an intergenic region with no DHS, pNET-seq, or GRO-seq signals, fused with the mini 35S pro. Test group: an enhancer candidate region fused with the mini 35S pro. For each experiment, Rluc driven by a UBQ10 promoter in the same vector was used to monitor transfection efficiency in the dual-luciferase assays, and the relative expression level was defined as the ratio of Fluc to Rluc. The relative expression level of the test group was normalized to that of the blank control, yielding enhancer activity (relative intensity).
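The calculation described in this caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the function names and the luminescence readings are hypothetical, and the 2.0 cutoff is the threshold stated for this assay.

```python
def relative_expression(fluc: float, rluc: float) -> float:
    """Fluc/Rluc ratio; Rluc serves as the transfection-efficiency control."""
    if rluc <= 0:
        raise ValueError("Rluc signal must be positive")
    return fluc / rluc


def enhancer_activity(test_fluc: float, test_rluc: float,
                      blank_fluc: float, blank_rluc: float) -> float:
    """Relative expression of the test construct normalized to the blank control."""
    blank = relative_expression(blank_fluc, blank_rluc)
    if blank <= 0:
        raise ValueError("Blank control relative expression must be positive")
    return relative_expression(test_fluc, test_rluc) / blank


def is_positive_enhancer(activity: float, cutoff: float = 2.0) -> bool:
    """Apply the relative-intensity cutoff (>= 2.0 in this assay)."""
    return activity >= cutoff


# Made-up luminescence readings, for illustration only.
activity = enhancer_activity(test_fluc=9000.0, test_rluc=1500.0,
                             blank_fluc=2000.0, blank_rluc=1000.0)
print(activity, is_positive_enhancer(activity))
```

With these made-up readings the test construct has a Fluc/Rluc ratio of 6.0 and the blank control a ratio of 2.0, so the relative intensity is 3.0 and the candidate would be called positive.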
Relative intensity ≥ 2.0 was set as the cutoff for a positive enhancer, that is, one that activated reporter expression.

In most cases, each of the three subgenomes carries one copy of a homeolog gene at the corresponding homeologous site. These three genes are referred to as a triad. For the correlation analysis, we focused on 67,108 pairs of homeolog genes that have 1:1 correspondence between any two of the three subgenomes.
Taking the homeolog genes from the A and B subgenomes as an example, gene pairs can be divided into four scenarios: I, homeolog A is associated with transcribed enhancer(s) and homeolog B is not; II, homeolog B is associated with transcribed enhancer(s) and homeolog A is not; III, both homeologs A and B are associated with transcribed enhancer(s); IV, neither homeolog A nor B is associated with a transcribed enhancer. We named scenario I "A W, B W/O" and scenarios II-IV "other". (For the case in which homeolog B is associated with at least one transcribed enhancer while homeolog A is not, the same statistics were performed. Since A and B are symmetric, only the former case is discussed below.) Based on expression level, all the gene pairs can also be divided into "A > B" and "A <= B". Thus, all the gene pairs were divided into four quadrants: z1 (red dots, "A W, B W/O" ∩ "A > B"), z2 (grey dots, "A W, B W/O" ∩ "A <= B"), x ("other" ∩ "A > B"), and y ("other" ∩ "A <= B").
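The quadrant assignment can be illustrated with a short Python sketch. The record type, field names, and toy values below are hypothetical, and x and y are taken to be the "other" pairs with "A > B" and "A <= B" respectively, which is an assumption inferred from how the odds are defined.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class HomeologPair(NamedTuple):
    """One A/B homeolog pair (hypothetical record layout)."""
    a_has_enhancer: bool  # homeolog A associated with >= 1 transcribed enhancer
    b_has_enhancer: bool  # homeolog B associated with >= 1 transcribed enhancer
    a_expr: float         # expression level of homeolog A
    b_expr: float         # expression level of homeolog B


def quadrant(pair: HomeologPair) -> str:
    """Return 'z1', 'z2', 'x' or 'y' for a single gene pair."""
    a_with_b_without = pair.a_has_enhancer and not pair.b_has_enhancer  # scenario I
    a_higher = pair.a_expr > pair.b_expr
    if a_with_b_without:
        return "z1" if a_higher else "z2"
    return "x" if a_higher else "y"  # scenarios II-IV ("other")


def count_quadrants(pairs: Iterable[HomeologPair]) -> dict:
    """Count gene pairs per quadrant, reporting zero for empty quadrants."""
    counts = Counter(quadrant(p) for p in pairs)
    return {name: counts.get(name, 0) for name in ("z1", "z2", "x", "y")}


# Toy pairs, for illustration only.
pairs = [
    HomeologPair(True, False, 10.0, 2.0),   # scenario I, A > B   -> z1
    HomeologPair(True, False, 1.0, 5.0),    # scenario I, A <= B  -> z2
    HomeologPair(True, True, 8.0, 3.0),     # scenario III, A > B -> x
    HomeologPair(False, True, 2.0, 2.0),    # scenario II, A <= B -> y
    HomeologPair(False, False, 4.0, 9.0),   # scenario IV, A <= B -> y
]
counts = count_quadrants(pairs)
print(counts)
```

Ties in expression fall into the "A <= B" class, matching the two-way split used in the text.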
We used the odds ratio (OR) to measure how strongly "A being expressed higher than B" is associated with "A is associated with transcribed enhancer(s) but B is not". The odds ratio is a ratio of two sets of odds:

odds1 = z1 / z2, where z1 is the number of gene pairs with A expressed higher than B when A is associated with transcribed enhancer(s) but B is not, and z2 is the number of gene pairs with A expressed no higher than B when A is associated with transcribed enhancer(s) but B is not.

odds2 = x / y, where x is the number of gene pairs with A expressed higher than B when A is not associated with transcribed enhancer(s), and y is the number of gene pairs with A expressed no higher than B when A is not associated with transcribed enhancer(s).

The odds ratio is then

$$\mathrm{OR} = \frac{\mathrm{odds}_1}{\mathrm{odds}_2} = \frac{z_1 / z_2}{x / y}.$$
OR > 1 (red dots, odds1/odds2) means that "A is associated with transcribed enhancer(s) but B is not" increases the likelihood of "homeolog A being expressed higher than homeolog B", while OR < 1 (grey dots, odds2/odds1) means that "A is associated with transcribed enhancer(s) but B is not" decreases the likelihood of "homeolog A being expressed no higher than homeolog B". The reciprocal ratio odds2/odds1 is the odds ratio for the complementary outcome "A <= B", so the two statements describe the same association from opposite sides.
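As a worked illustration, the odds ratio and its reciprocal can be computed from the four quadrant counts as follows. This is a minimal sketch with made-up counts, not values from the study, and the function name is hypothetical.

```python
def odds_ratio(z1: int, z2: int, x: int, y: int) -> float:
    """OR = odds1 / odds2 = (z1 / z2) / (x / y)."""
    if z2 <= 0 or x <= 0 or y <= 0:
        raise ValueError("z2, x and y must be positive for the ratio to be defined")
    odds1 = z1 / z2  # odds of A > B within "A W, B W/O"
    odds2 = x / y    # odds of A > B within "other"
    return odds1 / odds2


# Made-up counts, for illustration only.
z1, z2, x, y = 300, 100, 400, 400
or_a_higher = odds_ratio(z1, z2, x, y)   # odds1 / odds2, outcome "A > B"
or_a_not_higher = 1.0 / or_a_higher      # odds2 / odds1, outcome "A <= B"
print(or_a_higher, or_a_not_higher)
```

Here odds1 is 3.0 and odds2 is 1.0, so the odds ratio for "A > B" is 3.0 and the ratio for the complementary outcome is one third: the same association read from the two sides.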
Taken together, these results suggest that biased expression of the enhancers associated with homeologs A and B could be one reason for the asymmetric expression of homeologs A and B themselves.